Search CORE

15 research outputs found

Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

Author: Hokamp Chris
Liu Qun
Publication venue
Publication date: 01/01/2017
Field of study

We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates a sequence

\mathbf{\hat{y}} = \{y_{0}\ldots y_{T}\}

, by maximizing

p(\mathbf{y} | \mathbf{x}) = \prod\limits_{t}p(y_{t} | \mathbf{x}; \{y_{0} \ldots y_{t-1}\})

. Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate additional knowledge into a model's output without requiring any modification of the model parameters or training data. We demonstrate the feasibility and flexibility of Lexically Constrained Decoding by conducting experiments on Neural Interactive-Predictive Translation, as well as Domain Adaptation for Neural Machine Translation. Experiments show that GBS can provide large improvements in translation quality in interactive scenarios, and that, even without any user input, GBS can be used to achieve significant gains in performance in domain adaptation scenarios.Comment: Accepted as a long paper at ACL 201

arXiv.org e-Print Archive

Crossref

Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

Author: Elder Henry
Hokamp Chris
Publication venue
Publication date: 01/01/2018
Field of study

This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task

arXiv.org e-Print Archive

Crossref

DCU-SEManiacs at SemEval-2016 task 1: synthetic paragram embeddings for semantic textual similarity

Author: Arora Piyush
Hokamp Chris
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/06/2016
Field of study

We experiment with learning word representations designed to be combined into sentence level semantic representations, using an objective function which does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective which encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation. Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which uses vector similarity in addition to a suite of features we investigated for our 2015 Semeval submission. For the crosslingual task, we simply translate the Spanish sentences to English, and use the same system we designed for the monolingual task

DCU Online Research Access Service

Pushing the Limits of Translation Quality Estimation

Author: Astudillo Ramon
Grundkiewicz Roman
Hokamp Chris
Junczys-Dowmunt Marcin
Kepler Fabio N.
Martins André F. T.
Publication venue
Publication date: 01/07/2017
Field of study

Edinburgh Research Explorer

Findings of the 2015 Workshop on Statistical Machine Translation

Author: Bojar Ondrej
Chatterjee Rajen
Federmann Christian
Haddow Barry
Hokamp Chris
Huck Matthias
Koehn Philipp
Logacheva Varvara
Monz Christof
Negri Matteo
Post Matt
Scarton Carolina
Specia Lucia
Turchi Marco
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015
Field of study

This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic postediting task had a total of 4 teams, submitting 7 entries

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Target-centric features for translation quality estimation

Author: Chris Hokamp
Iacer Calixto
Jian Zhang
Joachim Wagner
Publication venue
Publication date: 01/01/2014
Field of study

Abstract We describe the DCU-MIXED and DCU-SVR submissions to the WMT-14 Quality Estimation task 1.1, predicting sentencelevel perceived post-editing effort. Feature design focuses on target-side features as we hypothesise that the source side has little effect on the quality of human translations, which are included in task 1.1 of this year's WMT Quality Estimation shared task. We experiment with features of the QuEst framework, features of our past work, and three novel feature sets. Despite these efforts, our two systems perform poorly in the competition. Follow up experiments indicate that the poor performance is due to improperly optimised parameters

CiteSeerX

Crossref

DCU Online Research Access Service